1 Work Performed within This Reporting Period

In  this  reporting  period,  we  performed  the  following  tasks. 

• Further enhanced the graph database architecture for modeling, storing, and
querying Twitter interaction graphs. We have developed an architecture to
ingest Twitter data, convert it into Twitter interaction graphs, and store them
in a graph database, namely OrientDB.

• Refined the k-hop neighborhood queries for Twitter interaction graphs. We
have refined the implementation and design of k-hop neighborhood queries, i.e.,
queries returning all neighbors up to and including distance k, and matured a
web-based visualization and UI to interact with the graph database.

1.1  Storing  Twitter  Interaction  Graphs  in  OrientDB 

The first step in developing the querying capability is to store a graph efficiently. A graph
database, for all practical purposes, can be represented as the graph itself. In its simplest
form, we consider the Twitter interaction graph modeled as an undirected graph
$G = (V, E)$, where $V$ is the set of nodes (vertices) and $E$ is the set of edges. For a
collection of tweets $\{\tau(\theta) \mid \theta \in \mathbb{Z}^+\}$, where each tweet
$\tau(\cdot)$ is uniquely identified by its tweet ID $\theta \in \mathbb{Z}^+$, let
$\tau(\theta)$ be tweeted by user $u_{\tau(\theta)}$ and let $u_{\tau(\theta)}$ have retweeted,
mentioned, or replied to $K_\theta$ users
$J(\tau(\theta)) = \{v^1_{\tau(\theta)}, v^2_{\tau(\theta)}, \ldots, v^{K_\theta}_{\tau(\theta)}\}$.
Of course, if no retweets, mentions, or replies are present, then
$J(\tau(\theta)) = \emptyset$. We can then unambiguously specify the Twitter Interaction
Graph $G$ with $V = \{u_{\tau(\theta)} \cup J(\tau(\theta)) \mid \theta \in \mathbb{Z}^+\}$ and
$E = \{u_{\tau(\theta)} \times J(\tau(\theta)) \mid \theta \in \mathbb{Z}^+\} = \{(u_{\tau(\theta)}, v^1_{\tau(\theta)}), (u_{\tau(\theta)}, v^2_{\tau(\theta)}), \ldots, (u_{\tau(\theta)}, v^{K_\theta}_{\tau(\theta)}) \mid \theta \in \mathbb{Z}^+\}$.
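
As an illustration of the definition above, the following minimal Python sketch builds the vertex and edge sets from a collection of tweets. The `Tweet` record, its field names, and the sample data are assumptions made for this example only; they do not describe the actual ingestion code or the OrientDB schema.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Tweet(NamedTuple):
    """Hypothetical tweet record used only for this sketch."""
    tweet_id: int             # unique tweet ID
    author: str               # user who posted the tweet
    targets: Tuple[str, ...]  # users retweeted, mentioned, or replied to


def build_interaction_graph(
    tweets: Iterable[Tweet],
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return (V, E): every author and target is a vertex, and each
    (author, target) pair contributes one edge."""
    vertices: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for tweet in tweets:
        vertices.add(tweet.author)
        for target in tweet.targets:
            vertices.add(target)
            edges.add((tweet.author, target))
    return vertices, edges


# Made-up sample data; tweet 2 has no retweets, mentions, or replies.
sample_tweets = [
    Tweet(1, "alice", ("bob", "carol")),
    Tweet(2, "bob", ()),
    Tweet(3, "carol", ("alice",)),
]
V, E = build_interaction_graph(sample_tweets)
```

A tweet with no interactions still contributes its author to the vertex set but adds no edges, which matches the empty-set case in the definition.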

One  can  enhance  G  by  adding  other  types  of  interactions,  e.g.,  by  adding  an  edge  if  a  user 
uses  a  hashtag  in  his/her  tweet.  In  this  particular  case,  the  vertex  set  will  have  hashtags  as 
well.  In  fact,  in  this  reporting 

To implement the graph modeling capabilities above, we used an open source graph
database, namely OrientDB (http://www.orientdb.com), and its SQL-like graph querying
language. OrientDB provides a NoSQL engine that (i) stores and queries graphs via both
a graph database API and a document API, and (ii) supports schema-less, schema-full, and
schema-mixed modes. Our initial experimentation with OrientDB yielded promising
results. In particular, we inserted a Twitter Interaction Graph with |V| > 34k and
|E| > 421k in less than 6 minutes. In addition, we ordered the nodes of this graph by
degree in less than 2 seconds. A basic graph service that performs basic graph database
management operations, e.g., insert, delete, and update, is also implemented.
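
To make the degree-ordering operation concrete, here is a small in-memory Python sketch. It is not the OrientDB query used in our experiments; the function name and the sample edges are assumptions for illustration, and edges are treated as undirected.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def nodes_by_degree(edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Count how many edges touch each node and return (node, degree)
    pairs sorted by decreasing degree, with ties broken by node name."""
    degree: Counter = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample edges.
example_edges = [("alice", "bob"), ("alice", "carol"), ("carol", "dave")]
ranking = nodes_by_degree(example_edges)
```

Nodes that appear in no edge are not listed by this sketch, since it works from the edge list alone.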

We have also designed and implemented the k-hop neighbor queries. In brief and
informally, starting from a node $v$, a k-hop query finds the neighbors of $v$, the
neighbors of neighbors of $v$ (a.k.a. second degree neighbors), and so on up to the k-hop
neighbors. In particular, for a querying function $Q(\cdot)$, a “2-hop” neighbors query is
implemented as follows.

$$Q(v) = \{(v, v') \cup (v', v'') \mid (v, v') \in E \text{ and } (v', v'') \in E \text{ for } \forall\, v', v'' \in V\}.$$

This can be generalized to k-hop by including neighbors of neighbors of neighbors, and so
on, up to k hops.
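
The k-hop generalization can be sketched as a breadth-first traversal that stops expanding once it reaches distance k. The Python code below is a minimal in-memory illustration under the undirected-graph assumption; the function name, return format, and sample path graph are assumptions for this example and do not reflect the Groovy/Grails and Java service.

```python
from collections import defaultdict, deque
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def k_hop_neighborhood(
    edges: Iterable[Tuple[str, str]], start: str, k: int
) -> Tuple[Dict[str, int], Set[FrozenSet[str]]]:
    """Return (distance, reached_edges) for the k-hop neighborhood of start.

    distance maps each node within k hops to its hop count from start;
    reached_edges holds every edge leaving a node at distance < k.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    distance = {start: 0}
    reached_edges: Set[FrozenSet[str]] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # nodes at distance k are not expanded further
        for neighbor in adjacency[node]:
            reached_edges.add(frozenset((node, neighbor)))
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance, reached_edges


# Made-up path graph a - b - c - d - e, queried with k = 2 from "a".
path_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
hops, hop_edges = k_hop_neighborhood(path_edges, "a", 2)
```

With k = 2 the returned edges are exactly the pairs (v, v') and (v', v'') of the 2-hop query; larger values of k simply let the traversal continue for more levels.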

This  querying  service  has  also  been  implemented  using  a  Groovy/Grails  framework  with 
functions  written  in  Java. 

1.2  Visualization  and  UI 

To visualize the graph returned by the queries, we have further developed IAI’s graph
visualization capabilities. We have also incorporated a UI to call the k-hop query service.
These capabilities are all incorporated into Scraawl.

Figure 1 is a visualization of a 4-hop neighborhood query starting from #EasterEggRoll.
The Twitter interaction graph was created from a collection of tweets that were
collected between 28 Mar, 2016 03:00 AM and 29 Mar, 2016 02:59 using the keyword
#EasterEggRoll. Overall, 25063 tweets were collected. The graph has 16171 nodes
(vertices) and 48594 edges.

The “Neighbors” button in the upper left corner calls the k-hop querying service with (i)
the starting node (in this case #EasterEggRoll), and (ii) the k-parameter (in this case k = 4).

[Screenshot: “Social Graph” view with the “Neighbors” button; 4 hop traversal starting from top user.]

Figure  1:  4-hop  neighborhood  query  starting  from  “#eastereggroll”. 

2  Current  Problems 

None. 

3  Work  to  be  Performed  in  the  Next  Reporting  Period 

In the next reporting period, we will focus on the following tasks:

• We will finish implementation of Task 1.

•  We  will  deliver  Scraawl  1.13.0. 